Advanced Techniques Beyond Generic Prompting
AI011 · Lesson 7

Optimization Through Fine-Tuning and Specialized Architectures

1. Beyond Prompting

While few-shot prompting is a powerful starting point, scaling an AI solution often calls for the next step: supervised fine-tuning. This process embeds specific knowledge or behaviors directly into the model's weights.

Key decision: Fine-tune only when the gains in response quality and the savings from reduced token usage outweigh the cost of the required compute and data preparation.

$\text{Cost} = \text{Tokens} \times \text{Price per Token}$
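The trade-off above can be sketched numerically. This is a minimal illustration of the cost formula; the token counts and per-token price are hypothetical, not real provider rates:

```python
# Sketch of the lesson's cost formula: Cost = Tokens x Price per Token.
# All numbers below are hypothetical, chosen only to illustrate the comparison.

def cost(tokens: int, price_per_token: float) -> float:
    return tokens * price_per_token

# Few-shot: every request carries long in-context examples in the prompt.
few_shot = cost(tokens=1_200, price_per_token=0.000002)
# Fine-tuned: the behavior lives in the weights, so prompts stay short.
fine_tuned = cost(tokens=300, price_per_token=0.000002)
```

Fine-tuning pays off when this per-request saving, accumulated over the deployment's volume, exceeds the one-time compute and data-preparation cost.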

2. The Small Language Model (SLM) Revolution

Small Language Models (SLMs) are efficient, scaled-down counterparts of larger models (e.g., Phi-3.5, Mistral Small). They are trained on highly curated, high-quality data.

Trade-off: SLMs offer significantly lower latency and enable edge deployment (running locally on-device), but they sacrifice the broad, general "human-like" intelligence of large language models.

3. Specialized Architectures

  • Mixture of Experts (MoE): A technique that scales total model size while keeping inference compute efficient. Only a subset of "experts" is activated for each input token (e.g., Phi-3.5-MoE).
  • Multimodality: Architectures designed to process text, images, and even audio together, extending use cases beyond text generation (e.g., Llama 3.2).
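The MoE idea above — scoring experts per token and activating only the top few — can be sketched with toy linear "experts". All names and shapes here are illustrative, not from any real MoE implementation:

```python
import numpy as np

# Minimal sketch of MoE top-k gated routing with toy linear experts.
# gate_w, experts, d, n_experts, top_k are all illustrative assumptions.

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2

gate_w = rng.normal(size=(d, n_experts))                        # router weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # expert matrices

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                     # score every expert for this token
    top = np.argsort(logits)[-top_k:]       # keep only the top-k experts
    w = np.exp(logits[top])
    w /= w.sum()                            # softmax over the selected experts
    # Only top_k of n_experts matrices are used: compute per token stays
    # bounded even as n_experts (total model size) grows.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.normal(size=d))
```

The key property is that total parameter count grows with `n_experts`, while per-token compute grows only with `top_k`.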
Priority Order for Efficiency
Try prompt engineering first. If that falls short, implement RAG (Retrieval-Augmented Generation). Use fine-tuning only as a final, advanced optimization.
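The escalation ladder above can be expressed as a simple decision function. The predicate names are illustrative placeholders, not part of any framework:

```python
# Sketch of the lesson's escalation ladder:
# prompt engineering -> RAG -> fine-tuning.
# Both boolean parameters are hypothetical names for this illustration.

def choose_strategy(prompting_meets_quality: bool,
                    needs_external_knowledge: bool) -> str:
    if prompting_meets_quality:
        return "prompt engineering"   # always the first, cheapest attempt
    if needs_external_knowledge:
        return "RAG"                  # ground responses in retrieved documents
    return "fine-tuning"              # last resort: embed behavior in the weights
```

The point of the ordering is cost: each rung requires more engineering and compute than the one before it, so you only climb when the cheaper rung fails.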
Question 1
When does the course recommend proceeding with fine-tuning over prompt engineering?
  • When the benefits in quality and cost (reduced token usage) outweigh the compute effort.
  • Whenever you need the model to sound more human-like.
  • As the very first step before trying RAG or prompt engineering.
  • Only when deploying to an edge device.
Question 2
Which model architecture allows scaling model size while maintaining computational efficiency?
  • Supervised Fine-Tuning (SFT)
  • Retrieval-Augmented Generation (RAG)
  • Mixture of Experts (MoE)
  • Multimodality
Challenge: Edge Deployment Strategy
Apply your knowledge to a real-world scenario.
You need to deploy a multilingual translation tool that runs locally on a laptop with limited GPU resources.
Task 1
Select the appropriate model family and tokenizer for this multilingual, low-resource task.
Solution:
Mistral NeMo with the Tekken Tokenizer. It is optimized for multilingual text and fits within SLM constraints.
Task 2
Define the deployment framework for high-performance local inference.
Solution:
Use ONNX Runtime or Ollama for local execution to maximize hardware acceleration on the laptop.
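As a concrete follow-on to Task 2, here is a sketch of what a request to Ollama's local `/api/generate` endpoint looks like. The model tag `mistral-nemo` is an assumption (check `ollama list` for what is actually installed); no network call is made here, only the JSON payload is built:

```python
import json

# Sketch of a local-inference request body for Ollama's /api/generate endpoint.
# "mistral-nemo" is an assumed model tag; substitute whatever `ollama list` shows.

def build_ollama_request(model: str, prompt: str) -> str:
    payload = {
        "model": model,     # locally pulled model tag
        "prompt": prompt,   # text to complete
        "stream": False,    # return one JSON object instead of a token stream
    }
    return json.dumps(payload)

body = build_ollama_request("mistral-nemo", "Translate to German: good morning")
# With Ollama running, POST this body to http://localhost:11434/api/generate.
```

Running inference through a local server like this keeps all data on the laptop, which is exactly the edge-deployment constraint the challenge describes.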